Abstract- The pervasive nature of the network, combined with mounting concerns about cyber
threats, demands immediate solutions for securing network communications. By equipping software
agents with knowledge and by investing in meta-data information sources, we can provide users with
effective information services. So far, research in network security has focused first and foremost
on securing the information rather than securing the infrastructure itself. Given the widespread
threat scenario, there is a pressing need to develop architectures, algorithms, and protocols to
realize a trustworthy network infrastructure. To reach this goal, the first step is to develop a
thorough understanding of the security threats and existing solutions, as stated in this paper. A
way of building ontologies that proceeds in a bottom-up fashion is presented, defining concepts as
clusters of concrete XML objects. Clusters are then generated, formed on the basis of the structure
of the input XML documents. The learning domain is a more general concept of security and health
care system. On today's global information infrastructure, manual knowledge extraction is often not
an option due to the sheer size and the high rate of change of available information. A bottom-up
method for ontology extraction and maintenance is intended to complement current ontology design
practice, where, as a rule, ontologies are designed top-down.



I. Introduction




Conventionally, the approach to community-oriented ontology building has been a collaborative one
aimed at valorizing the involvement of each community member in the knowledge creation activity.
Metadata extraction and merging is carried out manually by individual users as a part of their daily
activities, possibly taking sample data items into account. However, some drawbacks do exist.
Cooperatively building and maintaining an ontology takes a substantial amount of time, even in the
presence of careful process management. For this reason, recent research has turned once again to
automatic and semiautomatic (as opposed to cooperative) metadata construction and maintenance.
Automatic techniques for building and integrating metadata have been studied for many years by the
database community. More recently, several techniques specifically aimed at learning ontologies from
text corpora or database repositories were proposed.



In security-sensitive contexts, the ontological connection between data and its meaning frequently
dictates how that data is to be used. In this sense the ontology becomes a security policy. A risk
management process over the effectiveness and efficiency of the security controls has to be
established. This is an effort-consuming undertaking, especially for large organizations, which has
not yet been entirely assisted by automated processes. Consider a patient ontology for healthcare
applications. Only administrators may access the healthcare records of the patient ontology. That is,
different users are granted access to different parts of the ontology. We have used languages such as
XML (Extensible Markup Language) and OWL to specify such security policies.

II. Related Work 

The Onion system [9] was born as an attempt to reconcile ontologies underlying different biological information
systems. However, Onion is aimed at merging fully fledged, competing ontologies rather than at enriching and
developing an initial ontology based on emerging domain knowledge. The FCA-MERGE [4] technique is much closer to
ours, inasmuch as it follows a bottom-up approach, extracting instances from a given set of domain-specific text
documents by applying natural language processing techniques. Based on the extracted instances, classical
mathematical techniques like Formal Concept Analysis (FCA) are then applied to derive a lattice of concepts to be
later manually transformed into ontology. Focusing on feature taxonomies, Gupta et al. recently proposed a bottom- 
up learning procedure called TAXIND [U\ for learning taxonomies. An important contribution toward bridging the 
gap between conventional text-retrieval and structure-aware techniques was given by Bordogna and Pasi [E], whose 
work, however, does not specifically address Web and XML data.

Well-known research approaches to ontology learning include pattern-based extraction, conceptual clustering,
association rules mining, and ontology extraction and merging. Approaches to content-based clustering
can be classified as hierarchical algorithms, iterative algorithms, and meta-search algorithms.

III. Problem Description 

The project is about crafting an ontology for the security of our domain in a semi-automatic way.
First we collect XML documents (representing domain knowledge) related to security aspects such as
security issues, algorithms, and policies. We then classify the gathered documents using clustering
techniques. Here we adopt both structure-based and content-based clustering techniques for the
classification of the documents. This helps generate a better taxonomy with characteristics such as
cardinality, subclass-of relations, and part-of relations. Using these clustered documents, we
generate ontology metadata suitable for enriching and updating an existing ontology. Next we design
an ontology for the health care system. Merging the ontologies formed above gives a secured ontology
for the health care system.

The formation of a fuzzy bag for each input XML document is presented here. These fuzzy bags are
also clustered on the basis of their structure. While traditional ontology learning approaches focus
on hyperonymic and association relations, our technique takes advantage of structural information,
including the elements' cardinality. In particular, our approach is aimed at detecting four
ontological relations: subclass-of, value restriction, cardinality restriction, and associated-to.




IV. Architecture of the Proposed Work



Figure 1 shows the architecture, which contains three major steps for building a taxonomy of the
domain knowledge: forming fuzzy bags, performing structure-based clustering, and performing
content-based clustering. XML documents are composed of a sequence of nested elements (possibly
repeated) called tags (xi), each containing another tag, a value, or both of them. In our approach we
only take into consideration tag names and not their values. Since XML documents are nested
representations of data, we can use fuzzy techniques to encode a nested object structure in a flat
representation, keeping some memory of the structure in the membership values. The first step of our
fuzzy extraction process consists of encoding XML documents (i.e. instances of our simplified XML
data model) as fuzzy multi-sets, i.e. fuzzy bags.

A fuzzy bag is a collection of elements with multiple occurrences and having degrees of membership.
Bags and fuzzy sets can be viewed as particular cases of fuzzy bags. Fuzzy bags are a straightforward
extension of fuzzy sets, where each element can have multiple instances. In a fuzzy set each element
is associated with a membership value. For example, a fuzzy bag can be obtained when some attributes
are removed from a fuzzy set of tuples. This is illustrated by the query "find the salaries of young
employees", which requires a projection (salary) of a fuzzy set of persons (the young employees) and
delivers a fuzzy bag.
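
The projection example can be made concrete with a minimal Python sketch. The employee names, salaries, and membership degrees are invented for illustration, and representing a fuzzy bag as a mapping from an element to its list of membership degrees is an assumption of this sketch, not something prescribed by the text.

```python
from collections import defaultdict

# Hypothetical fuzzy set of tuples: (name, salary) -> degree of being "young".
young_employees = {
    ("Ann", 3000): 0.9,
    ("Bob", 3000): 0.6,
    ("Eve", 4500): 0.6,
}

def project_salary(fuzzy_set):
    """Drop the name attribute. Tuples with equal salaries collapse into one
    bag element that keeps every membership degree (sorted descending)."""
    bag = defaultdict(list)
    for (_name, salary), degree in fuzzy_set.items():
        bag[salary].append(degree)
    return {salary: sorted(degrees, reverse=True) for salary, degrees in bag.items()}

salaries_of_young = project_salary(young_employees)
print(salaries_of_young)
```

The salary 3000 now occurs twice, with a different degree for each occurrence, which a plain fuzzy set cannot express.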

The encoding of an XML tree as a fuzzy bag can be performed as follows: supposing that every tag xi
has an associated membership degree μ(xi), initially equal to 1 (μ(xi) = 1), we divide every
membership degree by its nesting level L, obtaining a lossy representation of the tree that takes
into consideration the original nesting level of tags, but not the sibling relations. The membership
value is an index of the position of the element in the document; it is used to keep memory of the
document structure and the nesting level of the tag. This process is called the encoding process:
many encoding techniques can be used; the simple one we just described is called flattening. In order
to compare the fuzzy bags representing the XML documents belonging to the data flow, we have used a
measure of resemblance:

$$S(A, B) = \frac{M(A \cap B)}{M(A \cup B)} \qquad (1)$$

where M denotes the modulus and A, B are fuzzy bags.



Similarity values obtained from the comparison of this subset of documents are then used to populate
a matrix. Here, an α-cut has been performed and the original fuzzy values have been replaced by the
crisp value 1 where the similarity value is higher than the threshold α. We compute a collection of
blocks composed of documents close to a pattern, again based on similarity. Though such blocks are
not the result of proper clustering, since the chosen similarity measure lacks the mathematical
property of a distance, we shall call them clusters in the following. Intuitively, patterns within a
valid cluster are more similar to each other than they are to a pattern belonging to a different
cluster.
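
As an illustration of the α-cut on the similarity matrix, here is a minimal Python sketch. The matrix values are invented, and mapping the entries that do not exceed the threshold to 0 is an assumption; the text only states that values above the threshold become 1.

```python
def alpha_cut(similarity, alpha):
    """Replace each similarity above the threshold alpha with the crisp value 1.
    Entries at or below alpha become 0 (an assumption of this sketch)."""
    return [[1 if value > alpha else 0 for value in row] for row in similarity]

# Hypothetical pairwise similarities of three documents.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

crisp = alpha_cut(similarity, alpha=0.5)
for row in crisp:
    print(row)
```

In the crisp matrix, documents 0 and 1 form one block and document 2 stands alone; a higher α splits blocks further.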




It is necessary to extract a simple description of each cluster. This process of data abstraction
produces a compact description of each cluster in terms of cluster prototypes or representative
patterns, which we call cluster heads. A good candidate to be the cluster head should be the smallest
fuzzy bag in the cluster (i.e. the one with the smallest number of elements whose membership is
greater than 0) or, using a (non-symmetrical) measure of inclusion, the one most included in every
other bag belonging to the same cluster. Alternatively, a new fuzzy bag can be generated as the union
of all fuzzy bags in the cluster. This way, a cluster head does not necessarily coincide with a real
data item, since each tag is taken with the cardinality and membership degree it has in the bag where
it appears with the highest cardinality.
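
The first cluster-head candidate, the smallest fuzzy bag, can be sketched in a few lines of Python. The bag representation (tag name mapped to a list of membership degrees), the function names, and the choice to count every occurrence with positive membership are assumptions made for illustration only.

```python
def support_size(bag):
    """Count the element occurrences whose membership is greater than 0."""
    return sum(1 for degrees in bag.values() for d in degrees if d > 0)

def smallest_bag(cluster):
    """Return the bag with the fewest occurrences of positive membership."""
    return min(cluster, key=support_size)

# Hypothetical cluster of two flattened documents.
cluster = [
    {"patient": [1.0], "name": [0.5], "record": [0.5, 0.5]},
    {"patient": [1.0], "name": [0.5]},
]

head = smallest_bag(cluster)
print(head)
```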






V. Implementation 



The algorithms and procedures we followed to perform the above-specified tasks are explained below
in detail:

Nesting Algorithm: Fuzzy Bag Generator

• Initialize the parameters Text bag, Fuzzy bag, Lroot.

• Loop the following steps until all the tags of the XML doc are traversed.

• Create a fuzzy element to store the tag and its nesting level (weight).

• Membership = weight / Lroot [M = V / L], where Lroot = 1 when we are at the second nesting level of
the XML document.

• Increment the Lroot value by one until the loop ends.
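
The flattening step can be sketched in Python with the standard `xml.etree.ElementTree` parser. This is not the OntoExtractor code: the function name, the sample document, and the convention that the root sits at nesting level 1 are assumptions, and the level counter used in the steps above may be offset by one relative to this sketch.

```python
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """Encode an XML tree as a fuzzy bag: tag name -> list of membership degrees.
    Each tag starts with membership 1, which is divided by its nesting level."""
    bag = {}

    def visit(element, level):
        bag.setdefault(element.tag, []).append(1.0 / level)
        for child in element:
            visit(child, level + 1)

    # Assumption: the root element is at nesting level 1.
    visit(ET.fromstring(xml_text), 1)
    return {tag: sorted(degrees, reverse=True) for tag, degrees in bag.items()}

# Hypothetical input document; only tag names are used, values are ignored.
doc = "<patient><name/><record><date/><date/></record></patient>"
fuzzy_bag = flatten(doc)
print(fuzzy_bag)
```

The repeated `date` tag yields two occurrences with the same degree, so the cardinality of the tag survives the encoding while the sibling order is lost.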



Intersection of two fuzzy bags

• Take input as two fuzzy bags.

• Extract the elements that the two bags have in common and store them in a temporary bag called
result.

• Loop until all elements of the result bag are traversed:

• Store the weights of the element in the two input bags in two lists and sort them.

• Obtain the intersection of the two weight lists and store it in weightsIntersection.

• Traverse the two lists and obtain a third list whose size is the size of the shorter list
and whose element i is equal to the minimum between weightsBag1.get(i) and weightsBag2.get(i).

• End the loop and return the result.

• Given for example the lists L1 = [.5, .3] and L2 = [.9, .2, .1], the result will be L3 = [.5, .2].
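
The intersection steps above can be sketched in Python as follows. The bag representation (tag name mapped to a list of weights) and the function names are assumptions of this sketch; the weight lists are sorted in descending order before being compared position by position.

```python
def intersect_weights(weights1, weights2):
    """Element-wise minimum of two weight lists sorted in descending order.
    zip stops at the shorter list, so the result has the shorter length."""
    a = sorted(weights1, reverse=True)
    b = sorted(weights2, reverse=True)
    return [min(x, y) for x, y in zip(a, b)]

def intersection(bag1, bag2):
    """Keep only the tags common to both bags, with intersected weights."""
    common = bag1.keys() & bag2.keys()
    return {tag: intersect_weights(bag1[tag], bag2[tag]) for tag in common}

# The example lists from the text.
l3 = intersect_weights([0.5, 0.3], [0.9, 0.2, 0.1])
print(l3)
```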
Union of two fuzzy bags

• Take input as two fuzzy bags.

• Extract the elements from the two bags, perform the union, and store them in a temporary bag
called result.

• Loop until all elements of the result bag are traversed:

• If the element is present in only one of the bags, just copy it into the result.

• If the element is found in both bags, do the following:

• Store the weights of the element in the two bags in two lists and compute the union, which is
stored in UnionWeights.

• Traverse the two lists (which must be sorted in descending order) and return a third list whose
size is the size of the longer list and whose element i is equal to the maximum between
weightsBag1.get(i) and weightsBag2.get(i).

• End if, end the loop, and return the result.

• Given for example the lists L1 = [.5, .3] and L2 = [.9, .2, .1], the result will be L3 = [.9, .3, .1].
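
A matching Python sketch of the union steps is given below. As before, the bag representation and function names are assumptions; padding the shorter weight list with 0 is simply one way to obtain a result as long as the longer list.

```python
from itertools import zip_longest

def union_weights(weights1, weights2):
    """Element-wise maximum of two weight lists sorted in descending order.
    The shorter list is padded with 0, so the result has the longer length."""
    a = sorted(weights1, reverse=True)
    b = sorted(weights2, reverse=True)
    return [max(x, y) for x, y in zip_longest(a, b, fillvalue=0.0)]

def union(bag1, bag2):
    """Tags found in only one bag are copied; shared tags get merged weights."""
    return {
        tag: union_weights(bag1.get(tag, []), bag2.get(tag, []))
        for tag in bag1.keys() | bag2.keys()
    }

# The example lists from the text.
l3 = union_weights([0.5, 0.3], [0.9, 0.2, 0.1])
print(l3)
```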



Division of two fuzzy bags

• Take input as two fuzzy bags.

• A fuzzy number represents the membership value and the multiplicity.

• Compute the union of the membership values and store it as union.

• Sort the above list in any order.

• Get the division multiplicities as mult1 = (union, f1) and mult2 = (union, f2).

• Loop until the end of the union list:

• Get the membership values into m1 and m2.

• If one of the elements is null, do not add the element to the number.

• Otherwise, compute Value(m1) / Value(m2) for the corresponding entry of union.
Jaccard Norm Algorithm

• This algorithm is used to compute the similarity value between two fuzzy bags. Given two bags A
and B, it computes the following norm: S(A, B) = |intersection(A, B)| / |union(A, B)|, where A and B
are two fuzzy bags and S is the similarity value between them.

• Use the Average approximation algorithm to round off the float digits obtained while computing the
division of two membership values.
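
The norm can be sketched in Python as shown below. The text does not define the cardinality |.| of a fuzzy bag; this sketch assumes it is the sum of all membership degrees. The function names, the sample bags, and the value returned for two empty bags are likewise assumptions.

```python
from itertools import zip_longest

def _paired(weights1, weights2):
    """Pair two weight lists position by position after sorting them in
    descending order, padding the shorter one with 0."""
    a = sorted(weights1, reverse=True)
    b = sorted(weights2, reverse=True)
    return zip_longest(a, b, fillvalue=0.0)

def jaccard(bag1, bag2):
    """S(A, B) = |intersection(A, B)| / |union(A, B)|, with the cardinality
    taken as the sum of membership degrees (an assumption of this sketch)."""
    inter_size = 0.0
    union_size = 0.0
    for tag in bag1.keys() | bag2.keys():
        for x, y in _paired(bag1.get(tag, []), bag2.get(tag, [])):
            inter_size += min(x, y)  # padded positions contribute 0
            union_size += max(x, y)
    # Two empty bags are treated as identical (assumption).
    return inter_size / union_size if union_size else 1.0

# Hypothetical flattened documents.
a = {"patient": [1.0], "name": [0.5]}
b = {"patient": [1.0], "name": [0.5], "record": [0.5]}
print(jaccard(a, b))
```

Here the intersection has size 1.5 and the union has size 2.0, so the similarity is 0.75.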



K-means Clustering Algorithm. 

• Choose an initial cluster and drop the first bag into it.

• Compute the cluster head, which is equal to the union of the member fuzzy bags.

• Compute the similarity value between the cluster heads and one un-clustered bag by using the
Jaccard Norm.

• Choose an initial alpha-cut value.

• If the similarity value is greater than alpha, drop the bag into the same cluster; otherwise create
a new cluster for the bag.

• Repeat the above procedure for different alpha values.

• Choose the most suitable clustering among these for further processing.
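
The steps above read as a single-pass, threshold-based procedure, and that is what the following Python sketch implements. It assumes the sum of membership degrees as bag cardinality and, when several clusters exist, compares the bag with the most similar cluster head; both choices, along with all names and sample bags, are assumptions rather than details given in the text.

```python
from itertools import zip_longest

def _paired(w1, w2):
    """Pair weight lists sorted in descending order, padding with 0."""
    return zip_longest(sorted(w1, reverse=True), sorted(w2, reverse=True), fillvalue=0.0)

def union(bag1, bag2):
    """Position-wise maximum of the weights of every tag in either bag."""
    return {
        tag: [max(x, y) for x, y in _paired(bag1.get(tag, []), bag2.get(tag, []))]
        for tag in bag1.keys() | bag2.keys()
    }

def jaccard(bag1, bag2):
    """Sum of position-wise minima divided by sum of position-wise maxima."""
    inter_size = union_size = 0.0
    for tag in bag1.keys() | bag2.keys():
        for x, y in _paired(bag1.get(tag, []), bag2.get(tag, [])):
            inter_size += min(x, y)
            union_size += max(x, y)
    return inter_size / union_size if union_size else 1.0

def cluster_bags(bags, alpha):
    """Assign each bag to the most similar cluster if the similarity to its
    head exceeds alpha; otherwise open a new cluster for the bag."""
    clusters = []  # each cluster: {"head": fuzzy bag, "members": [bag indices]}
    for index, bag in enumerate(bags):
        best = max(clusters, key=lambda c: jaccard(c["head"], bag), default=None)
        if best is not None and jaccard(best["head"], bag) > alpha:
            best["members"].append(index)
            best["head"] = union(best["head"], bag)  # head = union of members
        else:
            clusters.append({"head": dict(bag), "members": [index]})
    return clusters

# Hypothetical flattened documents.
bags = [
    {"patient": [1.0], "name": [0.5]},
    {"patient": [1.0], "name": [0.5], "record": [0.5]},
    {"policy": [1.0], "rule": [0.5]},
]

for alpha in (0.5, 0.8):
    print(alpha, [c["members"] for c in cluster_bags(bags, alpha)])
```

Running the loop for several alpha values mirrors the last two steps: a low threshold groups the two patient documents, while a higher one separates them.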



VI. Simulation Results 



The simulation of the algorithms presented in the design part produces two major outputs. One is the
creation of a fuzzy bag for each input XML document and the other is the set of structure-based
clustered fuzzy bags. These are shown in the following figure, which is produced by the tool
OntoExtractor.




Figure 2a. Fuzzy Bag Representations; 2b. Structural Clusters Generated with Different Alpha Values

In Figures 2a and 2b, the list of tags with their respective weights is displayed. The document is
also displayed in the form of a tree. We perform clustering for different alpha cut-off values so
that we are not restricted in choosing an initial clustering of fuzzy bags for further processing.



VII. Conclusion and Future Work 



Even though a vast literature on information extraction is available, developing and maintaining
ontology-based metadata is still more an art than a science. The rapid evolution of available
information is difficult to control, and potentially useful emerging concepts often remain unnoticed.
This project addressed this problem with a bottom-up, semiautomatic fuzzy technique for generating
ontologies from semi-structured data flows.

Further clustering of the bags based on their content helps in generating classes organized into a
hierarchy, which can then be translated into the OWL standard knowledge representation format by
taking each concept as a candidate class connected by a subclass-of relation to all concepts included
in it.